ABSTRACT

We present a fast, efficient and easy-to-apply computational method for the detection of planetary transits in large photometric datasets. The code was specifically produced to analyse an ensemble of 21,950 stars in the globular cluster 47 Tucanae for the photometric signature indicative of a transiting Hot Jupiter planet, the results of which are the subject of a separate paper. Using cross-correlation techniques and Monte Carlo tested detection criteria, each photometric time series is compared with a database of transit models of appropriate depth and duration. The algorithm recovers transit signatures with high efficiency while maintaining a low false detection probability, even in rather noisy data. The code has been optimized and, on a single 3 GHz machine, can analyse 10,000 stars and produce candidate transits in 24 hours.

We illustrate our algorithm by describing its application to our large 47 Tuc dataset, for which the algorithm produced a weighted mean transit recoverability spanning 85% to 25% for orbital periods of 1-16 days (half the temporal span of the dataset), despite gaps in the time series caused by weather and observing duty cycle. The code is easily adaptable and is currently designed to accept time-series data produced using Difference Imaging Analysis.



Subject headings: stars: planetary systems - occultations - methods: data analysis

1. Introduction

The periodic transits of extrasolar planets as they pass across the face of their parent
star present an important diagnostic tool for planetary detection and characterisation. The
method allows a direct measurement to be made of key system parameters, including the 
orbital inclination and the orbital period, and provides the only way at present to directly 
measure the planetary radius. Radius measurements constrain planetary migration history 
and evolution, and also allow comparison of observations to models of planetary atmosphere 
and composition (Burrows et al. 2000; Baraffe et al. 2003; Burrows et al. 2004). When 
coupled with radial velocity measurements, an accurate planetary mass and density can be 
derived (Charbonneau et al. 2000; Brown et al. 2001). Furthermore, the transit method can 
detect planets in regions outside the immediate solar neighbourhood and thus can be used 
to search for planets in distant locations of high stellar density, including globular clusters 
(Gilliland et al. 2000; Weldrake et al. 2005), open clusters (Mochejska et al. 2002, 2004) and
dense Milky Way starfields (Udalski et al. 2002, 2003). 

The use of transits for planetary detection was first suggested by Struve (1952) and 
studied in detail by Borucki & Summers (1984), but due to technological limitations was not 
implemented until Schneider & Doyle (1995) and Doyle et al. (1996), who used the method 
to search for planets in orbit around the M-dwarf binary CM Draconis. Deeg et al. (2000) 
announced the detection of a variation in eclipse minimum times in this system, which could 
indicate the presence of a third body of planetary mass. Further observations are required 
for confirmation; as yet, no transits in CM Draconis have been observed. 

Interest in planet transits received a considerable boost after the photometric detection
of HD209458b (Charbonneau et al. 2000; Henry et al. 2000), a short-period Jupiter-mass
planet in orbit around a solar-type star, spurring many other transit searches in recent years
(Horne 2003). Transits have a certain geometrical probability (Pgeo) of occurring for any
given star, depending on the stellar radius R* and the orbital semi-major axis a via:

$$P_{\rm geo} = \frac{R_* + R_p}{a} \simeq \frac{R_*}{a} \qquad (1)$$

where Rp is the planetary radius. Blind transit searches are most sensitive to large planets
with very small orbital radii, the so-called Hot Jupiter planets. Typical Hot Jupiter transit 
depths are ~0.015 mag (Charbonneau et al. 2000), with a duration of a couple of hours, and 
thus are challenging to detect with data taken in average ground-based conditions, for which 
the transit depth would be comparable to the photometric noise. 
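As a rough numerical illustration of Eq. (1), the sketch below evaluates the geometric transit probability for a hypothetical Hot Jupiter: a solar-radius host, a Jupiter-radius planet and a = 0.05 AU. These radii and the orbital distance are illustrative assumptions, not values taken from our dataset.

```python
# Geometric transit probability, Eq. (1): Pgeo = (R* + Rp) / a ~ R* / a
R_SUN_M = 6.957e8        # nominal solar radius [m]
R_JUP_M = 7.1492e7       # Jupiter equatorial radius [m]
AU_M = 1.495978707e11    # astronomical unit [m]

def transit_probability(r_star_m: float, r_planet_m: float, a_m: float) -> float:
    """Probability that a randomly oriented circular orbit produces transits."""
    return (r_star_m + r_planet_m) / a_m

# Assumed example system: solar-type host, Jupiter-sized planet at 0.05 AU
a_m = 0.05 * AU_M
p_full = transit_probability(R_SUN_M, R_JUP_M, a_m)
p_approx = R_SUN_M / a_m          # the R*/a approximation
print(f"Pgeo = {p_full:.3f} (R*/a approximation: {p_approx:.3f})")
```

Under these assumptions only about one such system in ten is aligned to transit, which is why large ensembles of stars must be monitored.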
As a general rule, transit searches involve dedicated small telescopes that monitor wide
fields of view, with expected transit yields of several per year (see e.g. Hidas et al. 2003).
An all-sky transit survey has recently been described by Deeg et al. (2004), which could lead
to the detection of many new planets in the near future. A particularly successful search
has been undertaken by the OGLE-III group (Udalski et al. 2002, 2003), who have already
identified 137 transit candidates. The nature of these transit signatures is currently being
determined via vigorous radial velocity follow-up campaigns; of these OGLE-III candidates,
four objects have been confirmed as planets (Konacki et al. 2003; Bouchy et al. 2004; Konacki
et al. 2004; Pont et al. 2004). In addition, Alonso et al. (2004) announced the identification
of TrES-1, the first planet transit to be identified with the Trans-Atlantic Exoplanet
Survey. Due to the precise alignment (viewing geometry) required, a large ensemble of stars
must be sampled in order to detect the fraction of planets that will transit their host
as seen from Earth. With such large datasets, in which many thousands of stars must be
sampled simultaneously, it is necessary to employ a computational detection algorithm that
can efficiently identify candidate detections.

We present here a detection algorithm developed and tested to search a dataset of 
21,950 stars in the globular cluster 47 Tucanae for transiting planets, the results of which 
are presented in a companion paper (Weldrake et al. 2005). The code was tested extensively 
to maximize the detection of modeled transits of appropriate depth and duration, while 
keeping the false detection rate to an acceptable level. The result is an effective, fast method 
of transit detection that can be easily modified. 

Section 2 presents an overview of our detection algorithm, along with details of the model 
creation and detection statistic. Section 3 explains the detection criteria that separate real 
candidates from transit-like systematic effects in the presence of noise. Section 4 describes 
the Monte Carlo simulations we have performed to test the transit recoverability rates in 
our dataset and determine the false detection rates. Section 5 presents the summary and 
conclusions. 



2. General Method for Transit Detection 

In order to search for periodic transit signatures in our 47 Tuc dataset, we employed a 
variation of the Matched Filter Algorithm (MFA). This method of signal detection involves 
comparing the data to a series of appropriate models, and was first suggested for use in 
transit searches by Jenkins, Doyle & Cullers (1996). The Matched Filter approach has
been described in the literature as the best method specifically for transit searches
(Tingley 2003a,b; Kovacs, Zucker, & Mazeh 2002) and has been used in several transit
searches, for example, Gilliland et al. (2000) and Bruntt et al. (2003).

The method, as we have implemented it, assumes a simple square-well transit shape, which 
is a valid assumption when searching for a signal very close to the noise level, for which 
the shape of the short ingress/egress phase of the transit is essentially unresolved. The 
algorithm searches for multiple transits spanning a large range of periods and transit start
times within a parameter space determined by characteristics of the dataset. With a total
observing window of L_W days, the theoretical upper limit of orbital period to contain three
detectable (and hence confirmed) transits is L_W/2 days; we take this as the upper limit of
the orbital period spanned in the search.
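To see why L_W/2 is an upper limit, note that a strictly periodic transit first occurring at time t0 (measured from the first data point, with 0 ≤ t0 < P) recurs at t0 + P, t0 + 2P, ..., so a continuous window of length L_W holds floor((L_W - t0)/P) + 1 transits. The sketch below, with a hypothetical function name, ignores observing gaps and the finite transit duration; it shows that three transits at P = L_W/2 fit only when the first falls at the very start of the window.

```python
import math

def n_transits(period: float, t0: float, window: float) -> int:
    """Number of transit times t0 + k*period lying inside [0, window].

    Assumes 0 <= t0 < period, continuous coverage (no gaps) and point-like transits.
    """
    if t0 > window:
        return 0
    return math.floor((window - t0) / period) + 1

L_W = 30.4            # total observing window of the 47 Tuc dataset [days]
P_max = L_W / 2       # 15.2 days

print(n_transits(P_max, 0.0, L_W))   # 3: first transit exactly at the start
print(n_transits(P_max, 1.0, L_W))   # 2: any later start loses the third transit
```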

Superior results will be obtained if lightcurves containing obvious systematic effects 
are removed or the systematic variations reduced before they are submitted to the search 
algorithm. The lightcurves used in our 47 Tuc dataset were produced via difference imag- 
ing analysis, originally described as an optimum point-spread function (PSF) algorithm by 
Alard & Lupton (1998) and later modified by Wozniak (2000). Difference imaging analysis 
automatically removes many of the unwanted systematic effects caused by PSF variations 
over the observational time span, and is particularly useful in crowded stellar fields. Other 
methods to reduce the contamination caused by systematic effects are contained within our 
algorithm itself and are described in Section 3. Throughout, we will use results from our 47 
Tuc dataset (Weldrake et al. 2004), hereafter WSBFa, to illustrate the detection algorithm. 



2.1. Model Creation and Detection Statistic 

We begin by producing a large number (Nmod) of model transit lightcurves with which
to compare the data. Each model has a transit period of Pmod days, a transit depth of Dmod
magnitudes, and a transit duration of dmod hours. These values are chosen within a range of
Hot Jupiter transit depths and durations expected to be detectable in the dataset. Further
to these parameters, each model is characterised by Tshift, the time from the first data point
to the beginning of the first transit.

As an example, Fig. 1 shows a transit model with Pmod = 3.15 d, Dmod = 0.02 mag, and
dmod = 2 hours covering the whole of a 30.4 day observing window. This Pmod is typical
of the most common Hot Jupiter orbital period in the sample currently known in
the Solar Neighbourhood[1], and is used here to illustrate the visibility of such a
transit in a perfect dataset. The illustration has no gaps in temporal coverage, a situation
achievable only with perfect weather and no diurnal interruptions.



[1] From the Extrasolar Planet Encyclopedia, http://www.obspm.fr/encycl/encycl.html
The total number of models, Nmod, is the product of the total number of period steps,
L_W/(2ΔPmod), where ΔPmod = dmod/2, and the total number of transit start-time steps,
Pmod/ΔTshift. The net effect of these ΔTshift steps is to slide each model (with a particular
Pmod) across the data. The statistic we use to compare the model lightcurves to the
data and test for the presence of transits is a cross-correlation function, C(Pmod, Tshift), of
the form:



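$$C(P_{\rm mod}, T_{\rm shift}) = \sum_{i=1}^{n} D(t_i)\, M(P_{\rm mod}, T_{\rm shift} + t_i)\, \Delta t \qquad (2)$$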

where the lightcurve consists of i = 1 → n points; C(Pmod, Tshift) is the value of
the cross-correlation function for a transit model with period Pmod beginning at time Tshift;
D(t_i) is the deviation (in magnitudes) of the data from the mean value of the lightcurve
at the observational time t_i; M(Pmod, Tshift + t_i) is the deviation (in magnitudes) of the
model at this same time; and Δt is taken to be ΔTshift. The output of this detection statistic
is therefore a series of Nmod values of C(Pmod, Tshift) per lightcurve. Note that by taking
Δt = ΔTshift, we weight every data point equally in the cross-correlation. The inclusion
of non-equal weighting is discussed in Section 3.

The time-series data (i = 1 → n) are first shifted so that their mean equals zero,
allowing a direct comparison between the data and each transit model (which has
out-of-transit points set to zero). Only data points lying inside the model transit
are used in the analysis: by definition, points outside transit are zero in the model,
M(Pmod, Tshift + t_i) = 0, and thus do not contribute to C(Pmod, Tshift), so skipping them
saves computational time. Models with short periods have more points inside transit and
hence yield higher-significance detections than those with longer periods. If a simulated
transit is added to the actual lightcurve data, C(Pmod, Tshift) will be a maximum for the
model that best represents the transit.
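The cross-correlation described above can be sketched in a few lines of NumPy for a single (Pmod, Tshift) pair. This is a minimal illustration, not our production code: the function names are hypothetical, the square-well model is taken to be positive (fainter) in magnitudes, and the first model transit starts Tshift days after the first data point. The demo uses simulated sampling and noise purely to keep the example self-contained.

```python
import numpy as np

def box_model(t, period, tshift, depth, duration):
    """Square-well transit model M in magnitudes (positive = fainter).

    The first transit starts `tshift` days after the first data point and the
    model is exactly zero out of transit.
    """
    phase_time = np.mod(t - t[0] - tshift, period)
    return np.where(phase_time < duration, depth, 0.0)

def cross_correlation(t, mag, period, tshift, depth, duration, dt):
    """C = sum_i D(t_i) * M(t_i) * dt, evaluated on in-transit points only."""
    d = mag - mag.mean()                      # D(t_i): zero-mean deviations
    m = box_model(t, period, tshift, depth, duration)
    in_transit = m != 0.0                     # out-of-transit terms are zero anyway
    return np.sum(d[in_transit] * m[in_transit]) * dt

# Demo on simulated data (assumed sampling and noise, not the 47 Tuc photometry)
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 30.4, 2000))              # observation times [days]
mag = rng.normal(0.0, 0.015, t.size)                   # 0.015 mag Gaussian noise
mag += box_model(t, 3.15, 0.7, 0.02, 2.0 / 24.0)       # injected 2-hour, 0.02 mag transit
c_match = cross_correlation(t, mag, 3.15, 0.7, 0.02, 2.0 / 24.0, dt=0.04)
c_offset = cross_correlation(t, mag, 3.15, 2.0, 0.02, 2.0 / 24.0, dt=0.04)
print(c_match, c_offset)
```

The model with the correct start time should give a clearly larger C than the same period shifted off the transits.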

In summary, each lightcurve is compared to a total of Nmod transit models, spanning
a period (Pmod) range of 1 → L_W/2 days with ΔPmod steps of dmod/2, and a Tshift range
from 1 → Pmod. The step size ΔTshift should be chosen with regard to the temporal sampling
of the particular dataset, and is best chosen to be half the transit duration or smaller.

This total database of models and their associated C(Pmod, Tshift) values for each lightcurve
allows us to search an individual lightcurve completely for any transit-like feature that could
theoretically be contained within it, while optimising the time needed for the code to run.
By calculating the root-mean-square scatter, σ_Nmod, of the output C(Pmod, Tshift) values over
all models, we obtain a measure of the significance S(Pmod, Tshift) for a given lightcurve,
namely



$$S(P_{\rm mod}, T_{\rm shift}) = \frac{C(P_{\rm mod}, T_{\rm shift})}{\sigma_{N_{\rm mod}}} \qquad (3)$$

where C(Pmod, Tshift) is the value of the cross-correlation function of a given model and σ_Nmod
is given by:



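$$\sigma_{N_{\rm mod}} = \left[ \frac{1}{N_{\rm mod}} \sum_{N_{\rm mod}} \left( C(P_{\rm mod}, T_{\rm shift}) - \overline{C} \right)^{2} \right]^{1/2} \qquad (4)$$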



where $\overline{C}$ is the mean cross-correlation value over all models for a given lightcurve.
Put more simply, this measure converts the raw C(Pmod, Tshift) values from their original units
into multiples of the total rms scatter of the C(Pmod, Tshift) distribution, providing an output
that is easier to interpret.
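A minimal sketch of the significance calculation for one lightcurve, assuming the C values of all Nmod models are already collected in an array: each C is divided by σ_Nmod, the population rms scatter of all C values about their mean. The function name is hypothetical.

```python
import numpy as np

def significance(c_values):
    """S = C / sigma_Nmod for every model of one lightcurve.

    sigma_Nmod is the rms scatter of the C values about their mean
    (population form, i.e. dividing by Nmod).
    """
    c = np.asarray(c_values, dtype=float)
    sigma_nmod = np.sqrt(np.mean((c - c.mean()) ** 2))
    return c / sigma_nmod

s = significance([1.0, -1.0, 1.0, -1.0])
print(s)
```

Because S is a ratio, any constant rescaling of all C values (for example through the choice of Δt) leaves S unchanged.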






2.2. Application to the 47 Tuc Dataset 

For the 47 Tuc dataset (WSBFa), we adjusted the values of the code parameters dis- 
cussed earlier for speed and maximum recovery. First, our total observing window is 30.4 
days, giving an upper limit to the Hot Jupiter orbital period of 15.2 days for secure detection. 
Fig. 2 shows the actual 47 Tuc dataset temporal coverage with the same transit model as in
Fig. 1, and highlights the serious adverse effect of diurnal observing on transit visibility. To
analyse the 47 Tuc dataset, we used 1501 ΔPmod steps, each of 0.01 days increment, and
755 ΔTshift increments, each of 0.04 days, yielding a total Nmod of 1,133,255 multiple-transit
models to compare to each time series.
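The quoted model count is simply the product of the numbers of period and start-time steps. The sketch below builds both grids explicitly; starting the period grid at 1 day and the Tshift grid at zero are our assumptions for illustration.

```python
import numpy as np

# 47 Tuc search grid: step counts and increments from the text;
# grid origins (1 day for Pmod, 0 for Tshift) are assumed.
n_period_steps, dP = 1501, 0.01     # days
n_shift_steps, dT = 755, 0.04       # days

periods = 1.0 + dP * np.arange(n_period_steps)
shifts = dT * np.arange(n_shift_steps)
n_mod = periods.size * shifts.size

print(f"Nmod = {n_mod:,}")
print(f"Pmod grid: {periods[0]:.2f} to {periods[-1]:.2f} d")
```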

Fig. 3 shows a synthetic transit (with the same Pmod, Dmod and dmod as the model
presented in Figs. 1 and 2) added to an actual randomly selected lightcurve from the 47 Tuc
dataset with 0.015 magnitude photometric uncertainty. The transit is very easily seen in the
data. When phase-wrapped at the peak detected period (3.15 d, the actual true period), the
result is Fig. 4. The transit is clearly seen at phases φ = 0 and 1.
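Phase-wrapping a lightcurve at a trial period amounts to mapping each observation time to its fractional orbital phase. A minimal sketch follows; the reference time t_ref is an assumption (for example, the start of the first detected transit), and plots commonly repeat the curve beyond phase 1 so that a transit at φ = 0 also appears at φ = 1.

```python
import numpy as np

def phase_fold(t, period, t_ref=0.0):
    """Orbital phase in [0, 1) of each observation for a trial period."""
    return np.mod(np.asarray(t, dtype=float) - t_ref, period) / period

t = np.array([0.0, 1.575, 3.15, 4.725])   # observation times [days]
phase = phase_fold(t, 3.15)
order = np.argsort(phase)                  # plot mag[order] against phase[order]
print(phase)
```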
An example of a transit added to a 47 Tuc star of poor photometric quality (0.035 mag
rms) is shown in Fig. 5 and is phase-wrapped in Fig. 6. The model has the same Pmod and
dmod but a transit depth Dmod of 0.03 magnitude. The code detected this transit successfully
and with high significance, a notable result given the noise level of the data.

Fig. 7 displays the cross-correlation output for the models presented in Figs. 3 and 5 in
order to better illustrate the detection method. The detection significance, S(Pmod, Tshift),
is indicated for both modeled transits at the true transit period, Ptran, and at twice the true
period, 2Ptran. The points show the output significance of these two models with a correct
Ptran and correct Tshift (CPCT), correct Ptran with incorrect Tshift (CPIT), and incorrect Ptran
with correct Tshift (IPCT). The correct model clearly has the highest significance, and
the transit can be seen visually when phase-wrapped to this period.

Fig. 8 shows the significance values of models tested against a Ptran = 1.149 d transit with
a depth of 2% (0.02 mag) added to a (random) actual lightcurve with 0.015 magnitude uncertainty
from our 47 Tuc data. An extremely strong detection with S(Pmod, Tshift) = 22 is found at
the true Ptran, although aliasing is apparent at integer multiples of this period. The bottom
half of the figure shows the output for the same lightcurve with no transit added. Only
three data points are found above the detection criterion S(Pmod, Tshift) > Scr that we
adopt and describe in the next section. These three points are caused by random scatter in
that lightcurve.

3. Detection Criteria in the Presence of Noise 

In order to separate real detections from transit-like false detections caused by
variations inherent to the data, detection criteria are required. Both statistical and systematic
variations can produce high values of S(Pmod, Tshift) that do not correspond to real
transits (see, for example, Fig. 8). Consequently, the first detection criterion is a lower limit
on S(Pmod, Tshift): the higher S(Pmod, Tshift), the better the correlation between the model
and the data. However, this single criterion is not sufficient, as many lightcurves have
systematic effects that lead to spuriously high S(Pmod, Tshift) values, producing points that lie
in the detection regime. A further criterion is required to reduce these spurious detections.

Because of aliasing, a real detection of multiple transits yields a larger number of
different models with S(Pmod, Tshift) that satisfy the first criterion; we verified this by
testing the dataset. The second criterion therefore involves counting the number of
models, Np, that lie above the first S(Pmod, Tshift) detection threshold.
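The two criteria can be sketched as a single test on one lightcurve's array of S values. The function name and the strict inequalities are our assumptions; the thresholds are free parameters, and Section 3.1 quotes Scr = 6.5 with Np > 60 for the better-quality 47 Tuc bin and Scr = 7.0 with Np > 50 for the poorer one.

```python
import numpy as np

def passes_criteria(s_values, s_cr, n_p_min):
    """Two-stage detection test on one lightcurve's S(Pmod, Tshift) values.

    Criterion 1: a model counts only if S > s_cr.
    Criterion 2: the lightcurve is kept only if the number of such models,
    Np, exceeds n_p_min (real multiple transits light up many aliased models).
    """
    s = np.asarray(s_values, dtype=float)
    n_p = int(np.count_nonzero(s > s_cr))
    return n_p > n_p_min, n_p

# Toy input: 100 insignificant models plus 61 models above threshold
s_values = np.concatenate([np.zeros(100), np.full(61, 7.0)])
print(passes_criteria(s_values, s_cr=6.5, n_p_min=60))
```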

As well as the S(Pmod, Tshift) and Np criteria, false detection rates can be reduced by
considering the uncertainty of the photometry. A very small number of strongly discrepant
points in the time series with small formal errors can account for a large portion of false
detections. We have found that the majority of these spurious points are caused by bad
columns and other CCD blemishes, or by abrupt changes in observing conditions. Whenever
the deviation of such a discrepant point is larger than would be expected for a transit, we
reset the D(t_i) value of that point to the mean of the data. The consequences of this are
discussed further when considering the application of these criteria to the 47 Tuc dataset.

The detection process is complicated by the varying observational conditions typical
of long photometric time series. These can produce pseudo-periodic signals together with
an increase in photometric measurement errors. To reduce these effects, the contribution
of each data point to the C(Pmod, Tshift) calculation was weighted according to its
photometric uncertainty, using the standard weighting scheme:

$$w_i = \frac{1}{\sigma_i^{2}} \qquad (5)$$

where w_i is the point weight and σ_i is the error bar associated with the ith point. Points
with large error bars are thus given a small weight and do not add significantly to the final
C(Pmod, Tshift) for that model. By incorporating both the S(Pmod, Tshift) and Np detection
criteria, together with the outlier removal and the w_i weighting scheme, both the real and
false detection rates in the time series were kept at acceptable levels.
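A sketch of the weighted cross-correlation, assuming each term of the sum is multiplied by the inverse-variance weight w_i = 1/σ_i². Whether the weights are further normalised is not stated here; since S divides each C by the rms of all C values, any constant rescaling of the weights cancels in S anyway. The function name is hypothetical.

```python
import numpy as np

def weighted_cross_correlation(d, m, sigma, dt):
    """C = sum_i w_i * D(t_i) * M(t_i) * dt with w_i = 1 / sigma_i**2.

    d     : zero-mean data deviations D(t_i) [mag]
    m     : model deviations M(t_i) [mag], zero out of transit
    sigma : photometric error bars sigma_i [mag]
    """
    d, m, sigma = (np.asarray(x, dtype=float) for x in (d, m, sigma))
    w = 1.0 / sigma**2                     # inverse-variance weights
    in_transit = m != 0.0                  # only in-transit points contribute
    return np.sum(w[in_transit] * d[in_transit] * m[in_transit]) * dt

# Two identical in-transit points; the second has a 10x larger error bar
c = weighted_cross_correlation([0.02, 0.02, 0.0], [0.02, 0.02, 0.0],
                               [0.01, 0.1, 0.01], dt=1.0)
print(c)
```

In this toy case the noisy point contributes only 1% as much as the well-measured one.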



3.1. Detection Criteria for the 47 Tuc Dataset

We now discuss the application of these criteria to the 47 Tuc dataset, and illustrate
their effect on the final transit candidate lists. The data were split into two bins for separate
analysis, distinguishing lightcurves with relatively low photometric scatter (<0.02 mag) from
those with somewhat higher scatter (0.02 < rms < 0.04 mag). By applying slightly different
values of the detection criteria to the two bins, we could keep the detection levels high and
the false detection levels low.

We performed Monte Carlo tests, adding model transits of various depths and durations
to actual dataset lightcurves with differing photometric uncertainties, to determine the
maximum photometric scatter for which the algorithm could reasonably detect transits with
depths as large as Dtran = 0.03 mag. The expected depth of a transit depends on stellar
magnitude, and this value is the expected transit depth for stars at the lower limit of
our search range, as described in Weldrake et al. (2005). Stars with scatter greater than 0.04
mag were found to suffer from high false detection rates and an unacceptably low transit
recoverability. With this limit, the total number of lightcurves with rms < 0.04 mag in the
47 Tuc dataset is 21,950, a statistically robust sample for analysis.

For the better-quality lightcurves, only those models with S(Pmod, Tshift) > Scr = 6.5
(marked by a dotted line in Fig. 8) are passed on to the second detection criterion. For
poorer-quality stars in the second bin, the threshold is Scr = 7.0. The second criterion, Np, was
then calculated, and only those lightcurves with Np > 60 and Np > 50 for the high- and
low-quality bins, respectively, were considered further. These specific values were determined
by trial and error: the values were altered and the subsequent results noted. The final
choice was made by maximizing transit detection while keeping false detection rates low.

We chose to remove outliers deviating by more than 3.5 times the rms lightcurve scatter.
This could remove real transits with depths Dtran > 0.035 mag in our best-quality main-sequence
lightcurves (rms ~0.01 mag). However, such a depth is too great for a planetary transit;
it could instead be caused by a binary star, in which case it would already have been detected
by the Lomb-Scargle periodogram algorithm used to identify variable stars in the 47
Tuc field (Weldrake et al. 2004). On our poor-quality lightcurves, the clipping threshold
corresponds to a depth larger than any expected planetary transit.
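A minimal sketch of the outlier treatment, assuming a single pass, a two-sided cut at 3.5 times the rms scatter about the mean, and replacement of discrepant points by the lightcurve mean (so that their D(t_i) becomes zero) rather than deletion, which preserves the time sampling. The function name and these details are our assumptions.

```python
import numpy as np

def reset_outliers(mag, n_sigma=3.5):
    """Replace points deviating by more than n_sigma * rms from the mean with the mean."""
    mag = np.asarray(mag, dtype=float)
    mean = mag.mean()
    rms = mag.std()                                   # rms scatter about the mean
    cleaned = mag.copy()
    cleaned[np.abs(mag - mean) > n_sigma * rms] = mean
    return cleaned

# Toy lightcurve: +/-0.01 mag alternating scatter with one strongly discrepant point
mag = np.where(np.arange(100) % 2 == 0, 0.01, -0.01)
mag[50] = 1.0
cleaned = reset_outliers(mag)
print(cleaned[48:53])
```

With rms ≈ 0.01 mag, this cut affects only dips deeper than about 0.035 mag, as noted above.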

By incorporating the w_i weighting scheme into the calculation of C(Pmod, Tshift),
the number of detections caused by systematic trends was reduced by an order of magnitude.
Interestingly, this weighting scheme was found to reduce the false detection rates only
for lightcurves of poor photometric quality. The photometric uncertainty of poor-quality
lightcurves is strongly dominated by photon noise, which the photometry code can accurately
estimate. The point-to-point variations in the fractional photometric uncertainty of bright
stars, however, are dominated by systematic effects (such as PSF fitting), which are not
accurately reflected in the error bars returned by most photometry codes. The weighting
scheme was therefore used only for stars in the lower-quality bin.

4. Transit Recoverability and False Detection Rates 

In order to fully understand the ability of the code to detect transit-like events, extensive
Monte Carlo simulations involving modeled transits must be carried out. The limits to
which the code can detect transits are vital for determining the expected number of
planets that could potentially be found in any particular dataset, and for placing robust
statistical significance on the results. With this recoverability knowledge, the false detection
rates in the actual dataset can be estimated, which indicates the number of candidate
detections that will need to be visually examined. To speed up this subsequent examination,
the false detection rate should be minimized as much as possible.

We stress that the Monte Carlo transit recoverability tests should be carried out using
actual lightcurves from the dataset rather than simulated ones. This produces an accurate
estimate of the sensitivity of a given algorithm, as the exact temporal sampling and photometric
accuracy of the data are reflected in the tests. In practice, many thousands of model transit
time series are produced and added to actual lightcurves typical of the dataset. The code
is then run on all these synthetic transits to see how well the algorithm recovers them with
the given detection criteria, allowing a determination of the transit recoverability, Rtran. The
detection criteria can then be adjusted slightly and optimised for a given dataset.
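The injection-recovery loop can be sketched as follows: for each trial period and start time, a box transit is added to a base lightcurve and a detector decides whether it is recovered. Everything here is a hypothetical illustration. In particular, `toy_detect` is a stand-in that only checks the in-transit signal-to-noise at the injected parameters, whereas a real test would run the full blind search and detection criteria; and the demo uses simulated sampling and noise only to stay self-contained, whereas the text stresses that real lightcurves should be used.

```python
import numpy as np

def box_model(t, period, tshift, depth, duration):
    """Square-well transit (mag, positive = fainter); first transit starts tshift after t[0]."""
    return np.where(np.mod(t - t[0] - tshift, period) < duration, depth, 0.0)

def toy_detect(t, mag, period, tshift, duration, threshold=7.0):
    """Hypothetical placeholder detector: in-transit S/N at the injected parameters."""
    in_transit = box_model(t, period, tshift, 1.0, duration) != 0.0
    n_in = int(in_transit.sum())
    if n_in == 0:
        return False
    d = mag - mag.mean()
    snr = d[in_transit].mean() / (d.std() / np.sqrt(n_in))
    return snr > threshold

def recoverability(t, base_mag, periods, tshifts, depth, duration, detect):
    """Fraction of injected transits recovered at each trial period."""
    rates = []
    for period in periods:
        hits = 0
        for tshift in tshifts:
            injected = base_mag + box_model(t, period, tshift, depth, duration)
            hits += bool(detect(t, injected, period, tshift, duration))
        rates.append(hits / len(tshifts))
    return np.array(rates)

# Demo only: simulated sampling and noise stand in for a real lightcurve
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 30.4, 2000))
base = rng.normal(0.0, 0.015, t.size)
rates = recoverability(t, base, [1.0, 15.0], [0.2, 0.5, 0.8], 0.02, 2.0 / 24.0, toy_detect)
print(rates)
```

As in the text, recoverability falls with period because long-period models contain fewer in-transit points.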

The Monte Carlo can also be used to estimate the false detection rate. A false detection 
is defined as a candidate lightcurve which passes all the detection criteria, yet does not 
contain a model transit. Systematic effects account for the vast majority of these detections, 
specifically caused by data points at the beginning and end of a night, by cloudy weather 
and other spurious terrestrial effects. False detections may also arise from random statistical
fluctuations, since in very large datasets some points will always trigger a detection by chance.
A further factor to be considered is the effect of stellar crowding on photometric quality.
Even with difference imaging techniques, bright stellar companions can seriously degrade the
ability of the photometric pipeline to fit a reasonable point-spread function to fainter stars,
resulting in systematic deficiencies in the lightcurve photometry that can easily mimic the
appearance of a transit.

False detection rates were determined by running the algorithm with fixed detection
criteria on the actual 47 Tuc dataset and counting how many "detections" the code found.
These contain both real and false detections. The "real" transit occurrence rate is
expected to be 1/1,250 (Weldrake et al. 2005); dividing the rate of "detections" returned
by the code by this expected rate gives an estimate of the false detection rate. This value
should be minimised in these tests so that the number of candidates for inspection is reduced
while the number of true detections recovered remains large.

4.1. Recoverability and False Detection Rates in the 47 Tuc dataset 

For the 47 Tuc dataset, a large number of modeled transits were produced in order to
test the transit recoverability for the two photometric bins. As the expected depths and
durations of planetary transits change with stellar radius along the main sequence
of the cluster, a number of different models were produced and tested against the code.
A database of 1,464 model transits was produced for a range of orbital periods (60 steps
of 0.25 d increment covering Pmod = 1.15 → 16.15 days) and model transit start times (24
steps of 0.5 d increment) ranging from the middle to the end of the dataset, for each chosen
transit depth and duration to be tested. With four depth-duration combinations, the total
database of modeled transits tested with the algorithm numbered 5,856.
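The database sizes follow directly from the grid. Note that 60 steps of 0.25 d between 1.15 and 16.15 days give 61 trial periods, which is what reproduces 1,464 = 61 × 24 models per depth-duration combination; reading "steps" as intervals is our interpretation.

```python
import numpy as np

periods = 1.15 + 0.25 * np.arange(61)   # 60 intervals of 0.25 d: 1.15 ... 16.15 d
n_start_times = 24                      # start-time steps of 0.5 d each
n_per_combo = periods.size * n_start_times
n_total = 4 * n_per_combo               # four depth-duration combinations
print(round(periods[-1], 2), n_per_combo, n_total)
```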

For stars in the higher photometric quality bin, model transits were superimposed on a
randomly selected lightcurve with 1% (0.01 mag) photometric scatter. One dataset comprised
1,464 models with Dmod = 0.01 mag and a duration dmod = 2.5 hours, and another
1,464 models with Dmod = 0.02 mag and the same duration. The code was then applied to these
two model transit datasets, employing the same candidate selection criteria as described in
the previous section. The resultant transit recoverability is shown in Fig. 9. The solid line
denotes the ability of the algorithm to recover Dmod = 0.02 mag transits, and the dotted
line Dmod = 0.01 mag transits. The mean of these two recoverabilities is plotted as a lighter
shaded dotted line. On this scale, a recoverability of 0.5 indicates that half the model transits
were detected by the code. The recoverability drops off as period increases, as these transits
have fewer in-transit points to sample and a lower probability of displaying more than
one visible transit in the data. Fig. 9 indicates that the transit recoverability is very good
for these high photometric quality lightcurves over the whole of the sampled orbital period
range, with the scatter being due to the temporal sampling of the dataset.

Fig. 10 shows the corresponding recoverability rate for the bin of poorer-quality lightcurves,
for expected transits of depth Dmod = 0.025 mag and 0.03 mag, assuming a duration of dmod =
2 hours. The solid line indicates the recoverability for a Dmod = 0.025 mag transit added to a
lightcurve with rms = 0.025 mag, and the dark shaded dotted line indicates the recoverability
for a Dmod = 0.03 mag transit added to a lightcurve with 0.035 mag scatter. Again, for
both of these simulations the recoverability is good over all sampled orbital periods, ranging
from ~80% at Ptran = 1 day to ~20% at 16 days.

Fig. 11 combines the mean recoverabilities derived from all of the simulations,
weighted by the number of stars in each bin, as an indication of the total statistical
recoverability in the 47 Tuc dataset. This recoverability was subsequently used to calculate the
expected number of planets that should be detected in our data as a whole, for a given
assumed planet frequency, as presented in Weldrake et al. (2005).
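Combining the per-bin recoverability curves is a star-number-weighted average at each period. A minimal sketch with hypothetical inputs; the rates and bin sizes below are illustrative only and are not the 47 Tuc values.

```python
import numpy as np

def weighted_recoverability(rates_by_bin, n_stars_by_bin):
    """Star-number-weighted mean recoverability at each trial period."""
    rates = np.asarray(rates_by_bin, dtype=float)       # shape (n_bins, n_periods)
    weights = np.asarray(n_stars_by_bin, dtype=float)   # number of stars in each bin
    return weights @ rates / weights.sum()

# Hypothetical example: two bins, two trial periods
combined = weighted_recoverability([[0.9, 0.5], [0.8, 0.2]], [3, 1])
print(combined)
```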

In the better-quality bin, we found that, assuming a planet frequency in 47 Tuc similar
to that in the Solar Neighbourhood, one real transit detection should have been identified for
every 15 (true and false) detections the code passed, using both the σ_Nmod-normalised
significance S(Pmod, Tshift) and the Np detection criteria described earlier. That is, given the
number of stars in this bin, 3-4 real transits should be present in a set of 60 candidates, a
false detection rate of 15-to-1. Similarly, in the poorer-quality bin, for which the w_i weighting
scheme was incorporated, the false detection rate was 10-to-1. Although this could have been
reduced further by tightening the detection criteria, doing so was found to reduce the transit
recoverability too severely. Given the number of stars in the second bin, the 50 returned
"detections" should include 4-5 real planetary transits if 47 Tuc were like the Solar
Neighbourhood.
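The quoted false detection rates are the number of candidates passed by the code per expected real transit. Using the upper ends of the expected ranges given above (4 of 60 and 5 of 50) reproduces them; the function name is hypothetical.

```python
def false_detection_ratio(n_candidates, n_expected_real):
    """Number of candidates passed by the code per expected real transit."""
    return n_candidates / n_expected_real

better_bin = false_detection_ratio(60, 4)   # 60 candidates, upper end of 3-4 real transits
poorer_bin = false_detection_ratio(50, 5)   # 50 candidates, upper end of 4-5 real transits
print(better_bin, poorer_bin)               # 15.0 10.0
```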

It is important to note that the procedure used to weed out false positive detections
could conceivably reject real transits. However, visual inspection of randomly selected
model transits showed that the signature is clearly visible in the data and has an extremely
high detection significance compared to the systematic false positives. We therefore do not
expect the transit recoverability to be significantly reduced by this visual inspection method.

The majority of false detections were caused by crowding and are immediately identified
by plotting the lightcurve, which displays systematic features occurring at the same times
for most false detections. The problem was less severe in the uncrowded outer regions of 47
Tuc, where the false detection rates were one-third of the values presented above. Fig. 12 shows
the period distribution of all lightcurves that pass the detection criteria, and is therefore the
period distribution of the false detections that were common in the dataset. A few periods show
a significant excess of detections over the general trend; in particular, Pmod = 1.5 days and
Pmod = 4.48 days show a large excess of "detections". As it is more difficult for systematics
to produce a strong detection when the number of in-transit data points is reduced, the
number of false detections decreases as period increases.

The code is quick and easy to apply. Application to our whole 47 Tuc dataset was
accomplished in 16 steps on four 3057 MHz, 4096 MB RAM i86pc machines. The code completed
the task in 12 hours, and is easily modifiable to run on existing datasets.
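As a consistency check, the throughput implied by this run can be compared with the figure of about 10,000 stars per machine per day quoted in the abstract. The sketch assumes the load was split evenly across the four machines and that throughput scales linearly with time.

```python
n_stars = 21_950          # lightcurves searched
machines, hours = 4, 12   # four machines, 12 hours wall-clock

# Assumes an even split of stars across machines and linear scaling with time
stars_per_machine_day = n_stars / machines * (24 / hours)
print(round(stars_per_machine_day))   # 10975
```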

5. Summary and Implications for the 47 Tuc Transit Search 

We have presented and described a quick, efficient and easy to apply computational
method for the detection of planetary transits in large photometric time-series datasets.
Using a cross correlation function, the code compares each sampled lightcurve with a database
of model transits of appropriate depth and duration. A detection is indicated by
a significantly high value in the correlation distribution. Monte Carlo simulations, in which
appropriately modeled transits were superimposed on the data with their actual temporal
sampling and photometric characteristics, show an excellent weighted mean recoverability rate over
the whole of the sampled period range with a relatively low false detection probability. In
particular, the code achieves very good recoverability when searching for transits at or just
below the photometric noise level of the data. The code is easily adapted to search existing
datasets for the same photometric signals and can test 10,000 stars in
24 hours on a single 3057 MHz machine.
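The summary above can be sketched as follows. The sketch correlates the lightcurve with box-shaped model transits over a grid of periods and shifts, then expresses the best correlation in standard deviations of the whole grid's correlation distribution. It follows that outline under stated assumptions and is not the authors' code. The matched-filter normalisation, the box model shape, the grid spacing and all function names are hypothetical. The synthetic lightcurve is gap-free Gaussian noise rather than real 47 Tuc data.

```python
import random
import statistics


def box_template(times, period, t_shift, duration):
    """Unit-depth box transit: 1.0 inside a transit window, 0.0 outside."""
    half = duration / 2.0
    template = []
    for t in times:
        offset = (t - t_shift) % period          # time since the last transit centre
        offset = min(offset, period - offset)    # distance to the nearest centre
        template.append(1.0 if offset < half else 0.0)
    return template


def correlation(mags, template):
    """Matched-filter correlation of mean-subtracted magnitudes with a template."""
    mean = statistics.fmean(mags)
    norm = sum(w * w for w in template) ** 0.5
    if norm == 0.0:
        return 0.0
    return sum((m - mean) * w for m, w in zip(mags, template)) / norm


def search(times, mags, periods, n_shifts, duration):
    """Correlate against a (period, shift) model grid; return (S, period, shift).

    S is the best correlation in units of the standard deviation of all
    correlations over the grid, i.e. its significance within that distribution.
    """
    results = []
    for period in periods:
        for k in range(n_shifts):
            shift = period * k / n_shifts
            c = correlation(mags, box_template(times, period, shift, duration))
            results.append((c, period, shift))
    values = [c for c, _, _ in results]
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    best_c, best_period, best_shift = max(results)
    return (best_c - mu) / sd, best_period, best_shift


def inject(times, mags, period, t0, depth, duration):
    """Add a box transit of the given depth in mag (fainter = larger mag)."""
    template = box_template(times, period, t0, duration)
    return [m + depth * w for m, w in zip(mags, template)]


# Hypothetical test: gap-free 30-day sampling with 0.01 mag Gaussian noise.
rng = random.Random(1)
times = [0.03 * i for i in range(1000)]
noise = [rng.gauss(0.0, 0.01) for _ in times]
duration = 2.5 / 24.0                      # 2.5-hour transits
periods = [1.0, 1.3, 1.7, 2.2, 2.6, 3.15, 3.7, 4.48, 5.0, 6.3]
lc = inject(times, noise, period=3.15, t0=0.5, depth=0.02, duration=duration)

s_transit, p_best, _ = search(times, lc, periods, n_shifts=63, duration=duration)
s_noise, _, _ = search(times, noise, periods, n_shifts=63, duration=duration)
print(f"with transit: S = {s_transit:.1f} at P = {p_best} d")
print(f"noise only:   S = {s_noise:.1f}")
```

A Monte Carlo recoverability estimate in this spirit would repeat `inject` and `search` over many real lightcurves and random epochs. It would count a transit as recovered when S exceeds the detection threshold (6.5 for the higher-quality bin) at the injected period. Note that the alias at twice the true period (6.3 d) also correlates, but more weakly, because it covers only half of the transits.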

Our algorithm was developed and tested specifically to search an ensemble of
21,950 stars in 47 Tucanae for Hot Jupiter planets with orbital periods in
the range 1-16 days. Despite a detailed search, no transits were found (Weldrake et al.
2005), a null result that is highly significant when compared with the frequency of
Hot Jupiters in the Solar Neighbourhood.
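One common way to quantify such a null result is to treat the number of detectable planets as Poisson-distributed. If $N_{\rm exp}$ planets are expected, the chance of finding none is $e^{-N_{\rm exp}}$, and this can be quoted as an equivalent one-sided Gaussian significance. The sketch assumes that approach. The expected yields shown are hypothetical; the value for 47 Tuc is derived in the companion paper.

```python
import math
from statistics import NormalDist


def prob_no_detection(n_expected):
    """Poisson probability of detecting zero planets when n_expected are predicted."""
    return math.exp(-n_expected)


def equivalent_sigma(p_value):
    """One-sided Gaussian significance corresponding to a probability p_value."""
    return NormalDist().inv_cdf(1.0 - p_value)


# Hypothetical expected yields, for illustration only.
for n_exp in (1.0, 3.0, 7.0):
    p = prob_no_detection(n_exp)
    print(f"N_exp = {n_exp}: P(0) = {p:.2e}, about {equivalent_sigma(p):.1f} sigma")
```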
Fig. 1. An example model transit ($P_{\rm mod}$ = 3.15 days, $D_{\rm mod}$ = 0.02 mag) covering the whole
of our observing window with no diurnal or weather-induced gaps. This shows how many
detectable transits a perfect dataset would contain for a typical Hot Jupiter planet.
Fig. 2. The same model transit as in Fig. 1, but with points shown only at the
actual observation times of our 47 Tuc dataset. Gaps caused by weather and daylight
eliminate many model transits; nevertheless, many detectable transits remain.
Fig. 3. A model transit as shown in Fig. 2, sampled at the actual temporal sampling of our
data and added to an actual lightcurve from our dataset with an rms photometric uncertainty of
0.015 mag. The transit is clearly seen at several places, and the error bars indicate
the photometric quality of each observation.
Fig. 4. The lightcurve shown in Fig. 3, now phase-wrapped to the model period of highest
significance ($P_{\rm mod}$ = 3.15 d) as determined by our detection algorithm. The transit, seen at
$\phi$ = 0 and 1, gives an excellent indication of transit visibility in our 47 Tuc dataset.
Fig. 5. The same $P_{\rm mod}$ = 3.15 d model transit as in Fig. 4, but with $D_{\rm mod}$ = 0.03 mag, now
superimposed on the lightcurve of a star with rms uncertainty 0.035 mag, typical of the
worst-quality data searched in the 47 Tuc dataset. Although the transit occurs at the same
places as in Fig. 3, it is much more difficult to see by eye.
Fig. 6. The lightcurve of Fig. 5, phase-wrapped to the model period yielding the highest
significance. The transit is barely visible at phase $\phi$ = 0 and 1. Nevertheless, during the
Monte Carlo simulations the code detected the transit in this lightcurve at the correct period
with $9.4\sigma$ significance.
Fig. 7. The output of the code, using the two lightcurves shown in Figs. 3 and 5 as
examples. The points show the significance of the detection for both lightcurves at their
respective periods, $P$, and at integer multiples of that period. The statistic $S(P_{\rm mod}, T_{\rm shift})$ is
shown at the positions for the correct $P_{\rm mod}$ with correct $T_{\rm shift}$ (CPCT), correct $P_{\rm mod}$ with incorrect
$T_{\rm shift}$ (CPIT), and incorrect $P_{\rm mod}$ with correct $T_{\rm shift}$ (IPCT).
Fig. 8. The total output of the transit-finding code. The top panel shows the output
with a $P_{\rm tran}$ = 1.149 day, 2% (0.02 magnitude) deep transit added to an actual dataset
lightcurve with a photometric rms scatter of 0.015 mag. The bottom panel shows the same
lightcurve without the transit. The detection is clear, producing a $22\sigma$ spike in the output
$C(P_{\rm mod}, T_{\rm shift})$ values, with the true period giving the highest-significance detection and
aliasing apparent at integer multiples of the true period, which are marked. The dotted line
indicates the $S_{\rm cr} > 6.5$ detection criterion, as described in the text and as applied to the
lightcurves in the bin of higher quality.
Fig. 9. The recoverability of $D_{\rm mod}$ = 0.01 mag and 0.02 mag transits with duration $d_{\rm mod}$ =
2.5 hours as a function of orbital period, for stars with rms scatter < 0.02 mag. The solid line
indicates the recoverability of 0.02 mag transits, whereas the dotted line indicates the
recoverability of 0.01 mag transits. The light dotted line is the mean recoverability of these two
simulations.
Fig. 10. The transit recoverability for the second of our dataset bins, stars with rms scatter
in the range 0.02-0.04 mag. The grey dotted line indicates the average recoverability for this 
bin. 
Fig. 11. The weighted mean transit recoverability as a function of orbital period in our
dataset. This is determined from the mean recoverabilities plotted in Figs. 9 and 10,
weighted by the number of stars sampled in each of the two rms bins. This recoverability
was used to determine the expected number of planets that should be seen in our 47 Tuc dataset.
Fig. 12. The normalised period distribution of all the models, after running the code on the
whole dataset, that satisfy the $S(P_{\rm mod}, T_{\rm shift}) > S_{\rm cr}$ detection criterion. Both photometric
quality bins are plotted. A few periods (in particular $P$ = 4.48 days) show many systematic
false detections.
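The combination described in the captions of Figs. 9 to 11 can be sketched as follows. First, the recoverability curves for the two simulated depths are averaged within each rms bin. Then the bins are combined, weighting each by its number of stars. The function names and the two-period example curves are hypothetical, chosen only to show the arithmetic.

```python
def mean_over_depths(curves):
    """Average recoverability curves (one per simulated depth) period by period."""
    return [sum(values) / len(values) for values in zip(*curves)]


def weighted_recoverability(bin_curves, stars_per_bin):
    """Combine per-bin curves, weighting each bin by its number of stars."""
    total = sum(stars_per_bin)
    return [
        sum(n * r for n, r in zip(stars_per_bin, values)) / total
        for values in zip(*bin_curves)
    ]


# Hypothetical curves sampled at two periods (fraction of transits recovered).
bin1 = mean_over_depths([[1.0, 0.8], [0.6, 0.4]])   # 0.02 mag and 0.01 mag depths
bin2 = mean_over_depths([[0.6, 0.3], [0.2, 0.1]])
print(weighted_recoverability([bin1, bin2], [3000, 1000]))
```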



